DdD 3 © Dl 
GENETICS 



HYPOTHESIS AND THEORY ARTICLE 

published: 14 July 2014 
doi: 10. 3389/fgene. 2014.00198 




Estimating directional epistasis 

Arnaud Le Rouzic * 

Centre National de la Recherche Scientifique, Laboratoire Evolution, Genomes, et Speciation, UPR 9034, Gif-sur-Yvette, France 



Edited by: 

Jose M. Alvarez-Castro, 
Universidade de Santiago de 
Compostela, Spain 

Reviewed by: 

Michael Kopp, Aix-Marseille 
University, France 
Janna Lynn Fierst, University of 
Oregon, USA 

'Correspondence: 

Arnaud Le Rouzic, Centre National 
de la Recherche Scientifique, 
Laboratoire Evolution, Genomes, et 
Speciation, Avenue de la Terrasse, 
Batiment 13, 91198 Gif-sur-Yvette, 
France 

e-mail: arnaud. le-rouzic@ 
legs, cnrs-gif. fr 



Epistasis, i.e., the fact that gene effects depend on the genetic background, is a 
direct consequence of the complexity of genetic architectures. Despite this, most 
of the models used in evolutionary and quantitative genetics pay scant attention to 
genetic interactions. For instance, the traditional decomposition of genetic effects models 
epistasis as noise around the evolutionarily-relevant additive effects. Such an approach 
is only valid if it is assumed that there is no general pattern among interactions — a 
highly speculative scenario. Systematic interactions generate directional epistasis, which 
has major evolutionary consequences. In spite of its importance, directional epistasis is 
rarely measured or reported by quantitative geneticists, not only because its relevance is 
generally ignored, but also due to the lack of simple, operational, and accessible methods 
for its estimation. This paper describes conceptual and statistical tools that can be used to 
estimate directional epistasis from various kinds of data, including QTL mapping results, 
phenotype measurements in mutants, and artificial selection responses. As an illustration, 
I measured directional epistasis from a real-life example. I then discuss the interpretation 
of the estimates, showing how they can be used to draw meaningful biological inferences. 
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1. INTRODUCTION 

An ability to understand and predict how genes affect morpho- 
logical, physiological, and behavioral characteristics is of crucial 
importance in biology. This also poses a considerable challenge, 
given the complexity of the genetic architecture of quantitative 
traits (Flint and Mackay, 2009). This complexity is not only due 
to the large number of genetic, environmental, and physiolog- 
ical factors involved, but also to their multiple and nonlinear 
interactions. In particular, it was noticed very early in the his- 
tory of genetics that the same genetic change often produces 
differing effects depending on the genetic background of the 
experimental species, population, or individual (Phillips, 1998; 
Wade et al, 2001; Phillips, 2008). The biological consequences of 
this phenomenon, known as "epistasis," have triggered a consid- 
erable amount of discussion. A whole century of active research 
in genetics and molecular biology has revealed the ubiquity of 
epistatic interactions associated with the organization of biologi- 
cal systems as networks of interacting molecules (Omholt et al., 
2000). However, we are still far from being able to integrate 
epistasis into a consensual, explicit, and predictive theoretical 
framework. 

In the classical analysis of genetic variance (Fisher, 1918), epis- 
tasis is considered as a source of noise. Most epistatic effects are 
not transmitted from parent to offspring, and therefore, are not 
involved in the response to natural or artificial selection. Epistatic 
variance — the contribution of epistasis to genetic variance in a 
population — can be calculated (Cockerham, 1954; Kempthorne, 
1954; Lynch and Walsh, 1998; Alvarez-Castro and Carlborg, 2007; 
Gjuvsland et al., 2007), but is almost meaningless in terms of pre- 
dicting the genetic properties of a population (Barton and Turelli, 
2004; Hansen, 2013; Alvarez-Castro and Le Rouzic, 2014), and 



may be negligible compared to evolutionarily-relevant additive 
genetic variance (Hill et al., 2008; Hemani et al., 2013). 

Another idea, which has become popular only in recent 
decades, is that epistasis matters because of its capacity to 
affect additive variance rather than because of its contribution 
to interaction variance (Cheverud and Routman, 1995). In an 
epistatic genetic architecture, the effects of alleles on the pheno- 
type depend on the genetic background. Accordingly, changes in 
the genetic background promoted by genetic drift (Goodnight, 
1987, 1988; Barton and Turelli, 2004; Turelli and Barton, 2006; 
Alvarez-Castro et al., 2009; Jarvis and Cheverud, 2009) or by 
selection (Carter et al., 2005; Hansen et al., 2006; Hallander 
and Waldmann, 2007; Le Rouzic et al., 2013) may reveal, hide, 
or revert allelic effects, and thus significantly affect the genetic 
variance. 

1.1. DIRECTIONAL EPISTASIS 

Epistasis can only exert a significant long-term influence on 
populations if individual epistatic effects do not tend to can- 
cel out each other, i.e., if a general pattern emerges. The most 
obvious pattern is the directionality of epistasis, the fact that 
genetic interactions can be biased toward either high or low phe- 
notype values. Estimates of directional epistasis allow to make 
useful predictions about the evolutionary potential of popula- 
tions: if additive genetic variance is a measure of evolvability 
(Houle, 1992; Hansen et al., 2011), then the directionality of epis- 
tasis is a measure of genetic architecture asymmetry, i.e., how 
evolvability is influenced by the direction of evolution. When 
epistasis is positive, evolution is easier in the direction of high, 
rather than low, phenotypic values (because additive genetic vari- 
ance tends to increase with the phenotypic value). In contrast, 
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negative epistasis favors evolution toward low phenotypic 
values. 

In spite of its predictive and descriptive value, directional epis- 
tasis is rarely reported for quantitative characters (Pavlicev et al., 
2010). This can be attributed to two main factors: (i) many (if 
not most) quantitative geneticists are used to measuring epista- 
sis via epistatic genetic variance, in spite of its marginal interest, 
and (ii) very few statistical or computational tools have been 
devised for measuring directional epistasis. The aim of this article 
is to present several methods for estimating directional epista- 
sis from genetic and phenotypic data, and to propose accessible 
statistical procedures for computing epistasis. Several such meth- 
ods will be illustrated from a real-life biological example, the 
genetic architecture of bodyweight in chicken, which displays a 
clear and consistent signal of positive epistasis. The data is based 
on a long-term artificial selection experiment on chicken body 
weight, and features (i) times series of the phenotypic response to 
selection, (ii) Quantitative Trait Locus (QTL) mapping data from 
a cross between the divergent lines, and (iii) minimal line-cross 
information (means of Fi and F2 populations) from the QTL 
setting. 

1.2. GENETIC MODELS 

In general, measuring the directionality of epistasis requires a 
model of genetic effects, i.e., a mathematical description of the 
relationships between the data (for instance, individual genotypes 
or phenotypes) and parameters to be estimated. The desirable 
properties for a "good" model of genetic effects depend on both 
the biological question and the nature of the data, and have 
resulted in rewarding (and sometimes conflictual) discussions 
(Cheverud and Routman, 1995; Hansen and Wagner, 2001b; Kao 
and Zeng, 2002; Yang, 2004; Zeng et al, 2005; Wang and Zeng, 
2006; Alvarez-Castro and Carlborg, 2007; Aylor and Zeng, 2008; 
Hansen, 2014). 

Genetic models can be conveniently divided into physiolog- 
ical and statistical models (Cheverud and Routman, 1995). In 
physiological (or functional: Hansen and Wagner, 2001b) mod- 
els, genetic effects are described relative to a reference genotype, 
which can be arbitrary (for instance, one of the parental strains 
in an intercross) or conventional (typically, the wild genetic 
background). Functional models are generally rooted in tradi- 
tional Mendelian genetics, in which a limited number of geno- 
types are experimentally generated and compared to reference 
strains. In contrast, statistical models quantify genetic effects in 
polymorphic populations across multiple genotypes. They are 
derived from the classical decomposition of genetic variance. 
Statistical genetic effects depend on allelic frequencies, and thus 
change when populations evolve; they provide a population- 
specific description of the genotype-to-phenotype map. In spite 
of obvious historical and conceptual divergences, it is sometimes 
possible to express both functional and statistical models in com- 
mon mathematical frameworks, and to transform functional into 
statistical estimates (and vice versa) by means of "change of ref- 
erence" operations (Hansen and Wagner, 2001b; Alvarez-Castro 
and Carlborg, 2007; Le Rouzic and Alvarez- Castro, 2008). 

With respect to epistasis, another useful distinction can be 
made between unidimensional and multidimensional models 



(Kondrashov and Kondrashov, 2001; de Visser et al., 2011). 
Unidimensional epistasis describes the general curvature of the 
genotype-phenotype map, and can be interpreted as the average 
effect of allelic substitutions that would be observed if all loci 
were exchangeable. Multidimensional epistasis accounts for the 
complexity of the genotype-phenotype relationship, by charac- 
terizing all pairs of loci that have a specific epistatic effect. While 
directional epistasis is unidimensional by definition, it can be 
measured based on either unidimensional or multidimensional 
models. 

Several models of directional epistasis will be reviewed below, 
starting from the multilinear model of epistasis, originally func- 
tional and multidimensional, which has been extended toward 
statistical and unidimensional formulations. I will then present 
and discuss alternative functional unidimensional models that are 
commonly used to measure epistasis for fitness, and show how 
they can be applied to quantitative characters. 

2. MULTILINEAR EPISTASIS 

2.1. THE MULTILINEAR MODEL OF GENETIC INTERACTIONS 

2.1.1. General framework 

The multilinear model of genetic interactions developed by 
Hansen and Wagner (2001b) extends and makes explicit the con- 
cept of directional epistasis in quantitative genetics, and makes 
it possible to build genotype-to-phenotype maps implementing 
directional epistasis. In its original multidimensional form, the 
model expresses the phenotype z as a multilinear function of the 
genotype G of an individual. For two loci, labeled "1" and "2" 
respectively, 

z G = z R + y h{ + y lR + yi R y2 R e n . (1) 

Genetic effects are measured relative to an arbitrary reference 
genotype for which yi = y 2 = 0, associated with a reference phe- 
notype zr. The effect of substituting the genotype of interest at 
locus 1 in the reference genotype _R is y\ R , and conversely, y% R is 
the effect at locus 2. When introducing the genotype of interest 
at both loci, in the absence of epistasis, the phenotype is expected 
to change by y\ R + y 2R . Any deviation from this expected additive 
outcome is attributable to epistasis. The originality of the multi- 
linear model is to assume that this deviation is proportional to the 
product of allelic effects, the proportionality coefficient su quan- 
tifying the strength and directionality of epistasis between loci 1 
and 2. 

The multilinearity arises from the fact that any change in 
the genotype of a locus when keeping the genetic background 
constant leads to a proportional change in the phenotype. For 
instance, Equation ( 1 ) can be reformulated as zq = a + jy\ R (with 
a = zr + y2 R and/ = 1 + y2 R £n)> illustrating that the genotype- 
phenotype map is always linear with respect to single genotypes 
(Figure 1). 

The epistatic coefficient, en, is expressed in terms of inversed 
phenotypic units (e.g., if the trait is measured in cm, e will 
be in cm -1 ), which is not intuitive and does not allow com- 
parisons between traits. Hansen and Wagner (2001b) suggest 
measuring epistasis by computing epistatic factors, /i = 1+72612 
and/2 = 1 + y\E\i, which quantify how much locus 1 is affected 
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by locus 2, and vice versa; f = 1 implies no epistasis, / < 1 
negative (antagonistic) epistasis, and / > 1 positive (synergistic) 
epistasis. 

2. 1.2. Statistical formulation 

The multilinear model is built as a functional model, since it 
defines genetic effects relative to a reference genotype, but a 
"change of reference" tool can be used to recompute genetic 
effects in any genotype or weighted combination of genotypes. 
When genetic effects are calculated relative to the average geno- 
type of a population, the marginal contributions of individual loci 
coincide with additive effects, and the model can be considered to 
be statistical. 

The multilinear model can also be used as a local approxima- 
tion on a non-multilinear genotype-phenotype map. There are 
various ways of generating genotype-phenotype maps, which are 
multidimensional mathematical functions g(y\,yi, . . . ,y„) that 
provide a deterministic phenotypic value for a series of genotypic 
values yi at n loci. Such mathematical maps are often defined 
in theoretical work intended to explain the evolution of popu- 
lations in complex genetic landscapes. Furthermore, even if the 
lack of large empirical genotype-phenotype data sets means that 
it is not yet realistic to attempt to do so, it is in principle possi- 
ble to fit smooth surfaces (such as multidimensional splines) to 
experimental measurements, and thus generate models of genetic 
landscapes that could be analyzed mathematically (and tested 
empirically). 

In any case, the multidimensional directional epistasis coef- 
ficients £y, which measures the curvature of the genotype- 
phenotype function between loci i and j, can be directly quantified 
as Eg = D^/DiDj, where D; = dg/dyi is the value of the first par- 
tial derivative of function g taken at the reference point, and 
D? = d 2 g/dyidyj is the mixed partial derivative (the curvature of 
the function g across both loci). This result illustrates the fact 
that the multilinear model is similar to a Taylor expansion of 



Positive epistasis 




FIGURE 1 I Multilinear genotype-phenotype maps for two loci, 
illustrating positive (synergistic) and negative (antagonistic) epistasis. 

yi and y^ represent the genotype values at both loci. The red lines 



the genotype-phenotype map that ignores intra-locus curvature 
(Hansen and Wagner, 2001b) (see Appendix I and Figure 2). 

2. 1.3. Composite directional epistasis 

The original multilinear model is multidimensional, as it involves 
as many e« parameters as pairs of loci. A unidimensional (and 



Negative epistasis 




highlight the multilinearity of the model: if the genetic background is kept 
constant, phenotype change depends linearly on the genotype at each 
locus. 




FIGURE 2 | The multilinear model (blue surface) is a local 
approximation of the interlocus curvature in a complex 
genotype-phenotype map. When the average genotype is chosen as the 
reference (red point), the multilinear approximation is able to predict the 
evolutionary properties of the population in a more precise way than the 
additive model. 
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Theoretical response to selection 
(positive directional epistasis) 
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FIGURE 3 | Top: Theoretical response to bidirectional constant selection 
under positive directional epistasis (s = +0.005, A Mo = 1). Bottom: the 
selection response is asymmetric, and the up-down average increases 
almost quadratically with time, the quadratic coefficient being eAjL . 



statistical) version of the model was proposed in Carter et al. 
(2005), with the composite directional epistasis coefficient e c cal- 
culated as the average e;< coefficient weighted by the additive 
genetic variance explained by each pair of loci: 

Both uni- and multi-dimensional versions of the model can be 
extended to higher orders of interactions and to multiple traits 
(Hansen and Wagner, 2001b). 

2.2. DIRECTIONAL EPISTASIS FROM PHEN0TYPIC DATA 
2.2. 1. Response to artificial selection 

Directional epistasis affects evolution, as it changes the amount of 
genetic variation available depending on the direction of pheno- 
typic change (Hansen et al, 2006). For instance, selection in the 
direction of positive epistasis tends to increase the frequency of 
synergistic genetic interactions, thus enhancing the effect of selec- 
tion. In contrast, selection in an antagonistic system decreases 
the genetic variance, and thus decreases the selection response. 
These effects can be experimentally observed, especially with bidi- 
rectional artificial selection responses, since they are expected to 
generate asymmetric responses in up- and down-selected lines. 

2.2.1.1. Theoretical framework. It is possible to model the 
expected impact of directional epistasis on genetic variance and 
to predict the difference between up- and down-selected lines as 
a function of the epistatic coefficients. Using a series of simplify- 
ing assumptions detailed in Appendix II, the selection response 
under a constant selection gradient after t generations is expected 
to be: 

log(l -2A M0 £f) 

\M - Mo 

2e 

% Mo + A^t + eAl/ + (3) 

where /xo is the initial mean phenotype, A M0 is the initial selection 
response (after the first generation), and e is the directionality 
of epistasis. The second part of the equation is the second- 
order Taylor approximation around t = 0, illustrating the linear 
selection response expected by the traditional breeder's equa- 
tion (A (!io f), and how directional epistasis appears as a quadratic 
term. Here, £ is the unidimensional directional epistasis, and thus 
corresponds to e c in Equation (2). 

A convenient way to estimate directional epistasis from bidi- 
rectional selection responses is to compute the up/down asym- 
metry through the average selection response, A(t) = j(up(t) + 
down(f )) (Figure 3). If epistasis is directional and relatively weak 
(A^e <g 1), A(t) changes approximately with f 2 , such that 
A(t) ~ e A 2 Q f 2 . It is thus possible to estimate A Mo as the slope at 
origin of the selection response, and then e through a quadratic 
regression on the average up/down response. Including the effects 
of e.g., inbreeding, linkage disequilibrium, or canalization, is pos- 
sible, but requires to numerically maximize the likelihood of 
complex models. This can be done with the software package sra 
for R, described in Le Rouzic et al. (201 1). 



2.2.1.2. Example: artificial selection on body weight. For more 
than 50 years, two chicken (Gallus gallus) lines were selected for 
high and low body weight at 56 days, respectively (Siegel, 1962; 
Liu et al., 1994; Dunnington and Siegel, 1996). The experiment is 
still ongoing; here, I consider the latest phenotypic results avail- 
able (54 generations, Dunnington et al., 2013). For simplicity, 
only the time series of mean phenotypes are considered, although 
some variance estimates were also available in this case. 

The impact of artificial selection was considerable (Figure 4). 
In the high-selection line, the body weight at 8 weeks rose 
from 800 g (male-female average) to 1650 g. In the low-selected 
line, the average body weight decreased to around 150 g, lead- 
ing to an impressive order-of-magnitude difference between 
high- and low-selected lines, well beyond the differences usu- 
ally observed between closely-related species, and spanning more 
than one third of the relative weight diversity in the entire 20 
Myr-old Galliformes order. The selection response was asymmet- 
ric: although the selection strength was identical in both lines, 
progress was slower in the low line. This can easily be attributed 
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FIGURE 4 | Top: male-female average experimental selection response on 
chicken bodyweight, digitalized from Figures 1, 2 in Dunnington et al. 
(2013). The initial selection response A,,,,, estimated by a linear regression 
over the first 20 generations (dashed segments), was 25.6 g per generation 
in the high line, and — 19.6g per generation in the low line. Bottom: 
quadratic regression on the up- and down-selection average, illustrating the 
cumulative effect of directional epistasis. The quadratic coefficient (which is 
an approximation of A^e), estimated by a non-linear, least-square 
regression, was 0.033 g per generation squared. 



to epistasis, given the expected differences in the genetic back- 
grounds of 1500 vs. 150 g birds. 

Using the procedure described in Equation (3), the strength 
of directional epistasis could be estimated from a quadratic 
regression over the high-low asymmetry. Estimating the initial 
selection response at around lA^J = 22.6 g per generation on 
average, directional epistasis is e — +6.6 x 10~ 5 g . Although 
apparently small, this figure is statistically significant and gener- 
ates cumulative effects on genetic architectures: Any phenotypic 
change corresponding to the initial (first-generation) selection 
response induces an increase of allelic effects of 0.15% in the 



high line, and decreased accordingly in the low line. The same 
allele is thus expected to display a >10% difference in the 
two extreme genetic backgrounds, representing weak, but non- 
negligible, epistasis. 

Of course, this estimate relies on major assumptions about the 
underlying process. Several genetic or non-genetic factors other 
than epistasis could affect the available genetic variance, and thus 
bias e. For instance, the quadratic approximation relies on the 
hypothesis that the selection gradient is constant over the entire 
time series, whereas in fact we know from e.g., Dunnington et al. 
(2013) that the selection intensity actually increases with time. 
Meanwhile, the reduced population size in the experiment nec- 
essarily generated a significant amount of inbreeding (even with 
a carefully-designed breeding scheme), which decreases the vari- 
ance due to genetic drift. However, these mechanisms are unlikely 
to generate misleading estimates of £, since (i) they affect both the 
up and down lines in the same way, and so cannot generate any 
asymmetry, and (ii) they tend to offset each other, as the selection 
strength increases while the genetic variance decreases. 

More worrisome is the possibility of uncontrolled natural 
selection in the low line. A fraction of the smallest birds appeared 
to be sterile or unviable, which could contribute to the slowing- 
down of the response. Such a mechanism could generate an 
asymmetric response, and thus spurious positive estimates of the 
epistatic coefficient. Nevertheless, this seems rather unlikely, given 
the behavior of the twelve relaxed selection lines presented in 
Dunnington et al. (2013). Indeed, when selection was stopped in 
both lines, the populations did not tend to evolve back to the orig- 
inal phenotype, as would have been expected if natural selection 
was preventing the population from responding to artificial selec- 
tion. The phenotypic data therefore seems to be compatible with 
a genetically-driven asymmetry, due to smaller allelic effects in 
low-weight chickens (i.e., positive epistasis). 

2.2.2. Line-cross analysis 

With the improvement in sequencing and genotyping technolo- 
gies, the phenotype-based methods developed and used by quan- 
titative geneticists for most of the 20th century to investigate 
genetic architectures without resorting to genotype data are cur- 
rently losing popularity. However, they are still both elegant and 
informative, especially when used to estimate general properies 
of populations such as unidimensional directional epistasis. One 
of the most powerful (and simple) of these biometric methods 
consists of crossing individuals or strains of interest in order to 
generate hybrid and backcross populations, from which the phe- 
notypic means and variances can be determined. The knowledge 
of the transmission mechanisms of genetic factors from parents 
to offspring makes it possible to disentangle the impact of addi- 
tive, dominance, and epistatic effects on the genetic differences 
between the original individuals (Lynch and Walsh, 1998 p. 205). 

A set of equations that can be used to compute additive, dom- 
inance, and directional epistatic effects from parental, intercross, 
and backcross populations are provided in Hansen and Wagner 
(2001b) (see Demuth and Wade, 2005, for an alternative model). 
Directional epistasis is unidimensional, and thus corresponds 
to the e c parameter of Equation (2). Below, a slightly different 
parameterization will be used, in which both parental populations 
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are separated by four additive effects, so that the model is identi- 
cal to a 2-locus QTL effect model in a diploid species. The model 
was set up so that genetic effects cancel out in the F 2 population, 
but a different reference point can be chosen (using the genetic 
effect matrices provided in, e.g., Alvarez-Castro and Carlborg, 
2007). Average phenotypes for both parental populations (Pi and 
P 2 ) and the first two intercross populations Fi and F 2 can be 
express as functions of four parameters: a reference /x (arbitrar- 
ily, the mean F2), additive and dominance effects A and D, and 
the directional epistasis coefficient e. 

Pi = 11 - 2A - D + e(A 2 +AD+ -D 2 ) 

4 

P 2 = u + 2A - D + e(A 2 -AD+ -D 2 ) 

4 

Pi = 11 + D+ -eD 2 (4) 
4 

Pi = fi. 

This simple model can be illustrated by the data from the exper- 
imental cross between the two chicken strains (Dunnington and 
Siegel, 1996; Marquez et al., 2010). In this experiment, the two 
generations of crossing necessary to generate a polymorphic F2 
population for QTL mapping makes it possible to sketch a min- 
imal line-cross analysis. Both parental populations as well as Fi 
and F2 individuals were raised in the same location, with the same 
food, and at the same density; their average weights at 8 weeks 
were 170 and 1412 for both parental chicken populations respec- 
tively, 650 g for the Fi, and 624 g for the F2. Both Fi and F2 are 
below the parental arithmetic average (791 g), suggesting the pres- 
ence of dominance and/or epistatic effects (Alvarez-Castro et al., 
2012). 

Although not perfect, this setting makes it possible to estimate 
up to four genetic parameters. Two models, with and without 
dominance, were tested, and gave very similar results (Equation 4 
and Table 1). The dominance effect, when estimated, was an 
order of magnitude below the additive contribution. Epistasis was 
positive, and of similar magnitude in both models. 

2.3. DIRECTIONAL EPISTASIS FROM QTL DATA 

Nowadays, data sets often consist of individuals in which both the 
phenotype and the genotype at loci of interest are known. This is 
for instance the case after the mapping of Quantitative Trait Loci 
(QTLs), either by linkage or association methods. Such data sets 
represent a valuable source of information about epistasis, and in 



Table 1 | Epistatic line-cross analysis of the chicken lines. 


Effect 


No dominance 


Dominance 


Reference fi 


637 g 


624 g 


Additive A 


310g 


318g 


Dominance D 




26 g 


Directional epistasis s 


LBxIO^g- 1 1 


.9 x ICHg- 1 



The full model (involving dominance) has no degree of freedom, so that 
statistical errors cannot be estimated. 



particular about multidimensional epistasis, which can hardly be 
estimated from phenotypic data. 

2.3. 1. Linear and multilinear models of genetic effects 

In most cases, QTL mapping procedures only focus on marginal 
(additive and dominance) effects, and do not explicitly consider 
genetic interactions (Carlborg and Haley, 2004). However, epis- 
tasis may be of major interest, both for improving QTL detection 
(Carlborg et al, 2003, 2004, 2006), and for the biological inter- 
pretation of the genotype-phenotype relationship (Malmberg 
and Mauricio, 2005; Le Rouzic et al, 2007, 2008). Mapping 
procedures accounting for epistasis generally rely on compo- 
nents of the interaction variance (Cockerham, 1954; Kempthorne, 
1954; Lynch and Walsh, 1998), which makes it necessary to 
estimate four genetic effects for each pair of loci (additive- 
by-additive, additive-by-dominant, dominant-by-additive, and 
dominant-by-dominant statistical effects). More recently, "vari- 
ance QTL" approaches have been proposed to map loci involved 
in various kinds of interactions, including gene-gene and gene- 
environment interactions (Ronnegard and Valdar, 2012). Until 
recently, there was no QTL mapping method based on direc- 
tional epistasis (Slatkin and Kirkpatrick, 2012), and estimation 
from genotype-phenotype data usually relied on model fitting on 
a predefined set of candidate loci (Cheverud et al, 2001; Le Rouzic 
et al., 2008; Shao et al, 2008; Pavlicev et al, 2010; Jarvis and 
Cheverud, 2011). 

The traditional genetic regression model, ignoring dominance 
(and dominance-related epistatic components), can be written as: 

P yi ,y 2 = I 1 + «iSi + a 2 S 2 + aauSu- (5) 

This model has 4 parameters for a pair of loci: \i is the inter- 
cept of the model (reference point), ai and a 2 are the additive 
effects for both loci, and aau — a traditional (and proba- 
bly unfortunate) notation, not to be confused with the product 
a x an — is the additive-by-additive effect. The S coefficients 
determine the genetic model, i.e., the weights of the genetic 
effects for each genotype. For instance, consider a haploid two- 
locus two-allele system with the reference genotype (arbitrar- 
ily) set to A1-B1. In the reference genotype, all S coefficients 
are set to 0 (/x, the reference point, thus corresponds to the 
intercept of the model). For genotype Ai52> Si = 0, S 2 = 1 
(because 1 effect a 2 has been added to the model, given the 
substitution of a B 2 allele), and S12 = 0. In genotype A 2 B 2 , 
Si = 1, S 2 = 1, and Sn = 1, reflecting the possibility of an 
interaction between A2 and B 2 alleles. Of course, different ref- 
erence points can be chosen, including mixtures of genotypes 
in specific frequencies (such as in the F 2 model, considering 
even allelic frequencies and Hardy-Weinberg proportions). The 
models becomes more complex with diploid genotypes (which 
include dominance effects), but the principle remains the same. 
Below, I used the model "NOIA" proposed by Alvarez-Castro 
and Carlborg (2007), which has some interesting statistical fea- 
tures. In particular, the model is orthogonal (provided there 
is no linkage disequilibrium) even if the population is not at 
Hardy-Weinberg proportions. In "NOIA," the S coefficients are 
stored as a genetic design matrix, and the model can be extended 
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(to include more alleles and/or more loci) using simple matrix 
algebra. 

It is possible to modify the above framework to estimate direc- 
tional epistasis. The strategy proposed by Le Rouzic and Alvarez- 
Castro (2008) is based on a non-linear, least-square regression, 
very similar to the framework proposed in Equation (4) for 
the analysis of line crosses: the model explicitly decomposes 
the epistatic parameter as a multilinear combination of additive 
effects, assuming that aa l} = a; x ot; x £;;: 

p yi>n = M + + a 2 S 2 + aia2£l2Si2- (6) 

This setting can easily be extended to account for dominance 
and higher-order epistasis (Alvarez-Castro and Carlborg, 2007; 
Le Rouzic and Alvarez-Castro, 2008; Pavlicev et al., 2010). When 
£y is estimated for each pair of loci, the model describes mul- 
tidimensional epistasis. There are two distinct ways to estimate 
unidirectional epistasis from this setting. The first method is to 
assume that e is identical between loci, i.e., replacing e t j by a 
constant £ in Equation (6). The second strategy is to estimate 
independent e« values for each pair of loci, and to compute 
the composite epistasis e c using Equation (2). This last strategy 
is more theoretically-grounded than the former, but it rapidly 
becomes impractical when the number of loci increases: the num- 
ber of interactions increases quadratically with the number of 
loci, which reduces the precision of pairwise interaction estimates. 

2.3.2. Application to QTLs for body weight 

Individuals from both the high and low chicken lines were inter- 
crossed at generation 46, to form the Fi and F2 populations 
described above. The 795 surviving individuals from the F2 pop- 
ulation were phenotyped for various characters and genotyped 
for 145 genetic markers on 25 chromosomes. The QTL mapping 
analysis identified 6 significant loci (four major loci and two of 
lesser effect). These significant loci combined explained around 
10% of the phenotypic variance, and strong epistatic interactions 
have been reported among them (Carlborg et al, 2006; Le Rouzic 
et al., 2007; Alvarez-Castro et al, 2012). For the sake of both 
simplicity and statistical power, only the four major QTLs are 
considered in the subsequent analyses. 

There are 24 second-order epistatic interactions between four 
loci (6 additive-by-additive, 6 dominance-by-dominance, and 12 
additive -by- dominance interactions). It is possible to estimate 
all of them using a model performing the traditional decom- 
position of genetic effects (here, I used the software package 
noia for R, Le Rouzic and Alvarez-Castro, 2008), but interpret- 
ing these 24 independent epistatic estimates is complicated: in 
spite of the large sample size (around 800 individuals), only 4 
(out of 24) epistatic estimates reached the 5% p-value thresh- 
old, and none remained statistically significant after correction 
for multiple-testing. There were no obvious signs of directional 
epistasis (11 positive estimates out of 24), even when focusing on 
additive-by-additive epistasis (3 positive estimates out of 6). 

Fitting a unidimensional multilinear model of epistasis leads 
to a much more conclusive analysis. The estimated constant s 
coefficient is positive (e = +0.057 g -1 ). The weighted compos- 
ite parameter, calculated from Equation (2), is also positive and 



of the same order of magnitude (s c = +0.020 g -1 ). The multi- 
linear model fits better than the traditional genetic-effects model 
with pairwise epistasis, outperforming it by 13.5 AIC units ( AAIC 
scores >10 can be considered to be conclusive, Burnham and 
Anderson, 2002). The multilinear model is also considerably 
better than models without epistasis (AAIC = 18.5). The undis- 
putable statistical superiority of the multilinear model translates 
into a substantial gain in explanatory power: the four-locus model 
without epistasis explains only 5.4% of the total phenotypic 
variance, while the multilinear model explains 7.8%. 

3. REGRESSIONS AGAINST THE NUMBER OF MUTATIONS 

While it is particularly rare to find estimates of directional epis- 
tasis for quantitative characters in general (Pavlicev et al., 2010), 
the sign of epistasis has been frequently estimated for fitness. The 
importance of directional epistasis for the logarithm of fitness 
has now been fully acknowledged by evolutionary biologists, as 
it affects the evolution of sex, recombination, mutation rates, and 
other related phenomena (Phillips et al., 2000). Here I will review 
two models frequently used in this context, and show how they 
can be modified to fit other quantitative traits. According to the 
previous definitions, these models are both functional and unidi- 
mensional, as they estimate directional epistasis with reference to 
the "wild type" with no mutations. 

3.1. MODEL DESCRIPTION 

A common way to estimate directional epistasis for (log) fitness 
is a "power" (or "multiplicative") model W = ccn^ (illustrated in 
Figure 6), where W stands for the log-fitness, a is the effect of a 
single mutation, n is the number of mutations, and fi measures 
directional epistasis. The model is based on the fact that the fit- 
ness of the reference individual or strain (n = 0) is 1, so that the 
intercept of the model is log(l) = 0 by construction. Fitness in 
single mutants (« = 1) is not affected by epistasis, which makes 
it possible to estimate a. Epistasis appears for n > 2, generat- 
ing deviations from linearity, fi > 1 represents positive epistasis, 
while fi < 1 stands for negative epistasis. The parameters of 
the model are usually estimated through non-linear regressions 
(least squares) or by non-linear generalized model approaches 
(maximum likelihood). 

An alternative setting is the quadratic model W = — (an + 
\fi'n 2 ) (Elena and Lenski, 1997; Kouyos et al, 2007) (for con- 
sistency with the literature, I have retained the same notation, 
although it should be noted that fi and fi' have different units, 
and fi' > 0 means positive epistasis). This latter model has some 
interesting theoretical properties associated with the Gaussian fit- 
ness function, and is more firmly grounded in classical population 
genetics theory (Charlesworth, 1990; Otto, 2007). 

Alternative parameterizations of the above models appear in 
the literature (e.g., estimating —a instead of a, or — 1 instead 
of fi, which provides a more straightforward interpretation of 
"positive" and "negative" epistasis). This framework is gener- 
ally used in two different experimental contexts: estimating the 
directionality of deleterious mutations (in which case, a < 0, and 
negative epistasis means that the deleterious mutations act syn- 
ergistically to decrease fitness), or estimating epistasis among the 
beneficial mutations accumulated during an artificial evolution 
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experiment (a > 0, and negative epistasis represents the antago- 
nistic effects of mutations) (Lenski et al., 1999; Wilke and Adami, 
2001; Maisnier-Patin et al, 2005). These symetric interpretations 
are arguably confusing, and the literature is not always consistent 
with regard to the association between the sign of directional epis- 
tasis and the synergistic or antagonistic properties of mutations 
(e.g., Szathmary, 1993). 

3.2. MODEL FITTING 

These models are clearly not suited for fitting traditional quan- 
titative genetics data, in which there are no "wild type" or 
"mutants." However, it is still possible to define the following con- 
tinuous function for a phenotype P, which behaves in a similar 
fashion as the power model: 



P(m) 



fj. — a\m\ 1 / 



if m > 0 
if m = 0 
if m < 0, 



(7) 



where m is a real number analogous to the "number of muta- 
tions" compared to the reference genotype, a and fi have the 
same meaning as in the power model (a is the average effect of 
the first mutation, and fi is the epistatic coefficient, with fi = 1 
standing for no epistasis). /x is the intercept of the model, i.e., the 
phenotype of the "reference genotype." This function is not dif- 
ferentiable at m = 0, but this is unlikely to affect the estimates. 
In order to obtain a proper analogy with traditional quantitative 
genetics, the mean F2 (same number of alleles from both parental 
lines) was chosen as the reference, m, the "number of mutations" 
parameter, thus stands for the number of additional "high-line" 
(H) alleles in a genotype compared to the reference. Considering 
the 4 significant QTLs, m = 0 for the reference (mean F2) geno- 
type (which has 4 low-line alleles and 4 high-line alleles), m = —4 
in the full low-line genotype (8 alleles from the low-line), and 
m = +4 in the full high-line genotype. An equivalent formulation 
(P{m) = + am + j fi' m 1 ) can also be defined for the quadratic 
model. 

Fitting the "continuous power model" of Equation (7) to the 
data by a non-linear, least-square procedure leads to the follow- 
ing estimates (estimate ± std. err.): a = 13.0 ± 5.8 g; fi = 2.18 ± 
0.41 (Figure 5). This is indicative of strong (and statistically sig- 
nificant) positive epistasis. The first allelic substitution in the 
reference background (average F2 individual) is thus expected to 
have an effect of 13 g, the second substitution will affect the phe- 
notype by 45.9 g (two "high" substitutions) or 4.9 g (two "low" 
substitutions). The epistatic effect is extreme for the fourth sub- 
stitution, which is predicted to have an effect of 124 g in the 
"high" direction (i.e., 10 times the estimated effect in the aver- 
age genetic background) but only 3 g in the "low" direction. The 
estimate of directional epistasis in the power model is heavily 
influenced by the few "extreme" genotypes: the 7 individuals with 
eight "H" alleles are all far above the average, which contributes 
to the excessive curvature of the genotype-phenotype relation- 
ship (Figure 5). Yet, epistasis is still present when all extreme 
genotypes (full homozygotes LL and HH) are removed, with an 
estimate of fi = 1.83 ± 0.50. 
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FIGURE 5 I The continuous version of the power model (Equation 7, 
solid line) and the quadratic model of epistasis (dashed line) applied 
to the chicken QTL data. The reference genotype contains as many "low" 
(L) alleles as "high" (H) alleles. The x-axis scales from —4 (LL genotype at 
all loci) to +4 (HH genotype at all loci). Intermediate numbers of mutations 
are due to genotype uncertainties when QTLs are not in total linkage 
disequilibrium with markers. 



Estimates from the quadratic model are a = 23. 1 ± 4.7 g, and 
fi' = 8.3 ± 4.0 g. In spite of the similar notation, fi' is not on the 
same scale as fi, and directional epistasis, although significantly 
positive, is smaller here (the two first allelic substitutions in the 
direction of higher phenotypes have an effect of 27.3 and 35.6 g 
respectively, vs. 19.0 g and 10.7 g for one and two substitutions 
toward lower phenotypes). 

4. DISCUSSION 

4.1. MODEL COMPARISONS 

Although they all provide an estimate of unidimensional direc- 
tional epistasis, the models reviewed in this paper have been 
designed to address different questions, and based in different 
sub-fields of population and quantitative genetics. 

The multilinear model provides an explicit description of epis- 
tasis between a set of loci, as in classical quantitative genetics 
models, and can be extended to fit to phenotypic data. On the 
opposite, both "regression" models suppose that epistatic pat- 
terns follow a general function. This incompatibility between 
models of directional epistasis for fitness and traditional quan- 
titative genetics models is probably an important factor in the 
lack of experimental measurements of directional epistasis for 
quantitative traits (Hansen and Wagner, 2001a; Pavlicev et al, 
2010). 

In addition to the fact that models are not designed to be 
applied to the same kind of data (the need to compare geno- 
types to an arbitrary wild type or the assumption of constant 
mutational effect size are difficult to overcome for quantitative 
genetics data), models also carry conceptual differences about the 
nature of epistatic interactions. For instance, the power model 
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necessarily involves highly complex epistatic interactions (Hansen 
and Wagner, 2001a). Quantitative genetics rely on linear models 
of genetic effects, in which interactions are calculated iteratively as 
the deviation between mutant phenotypes and the sum of lower 
effect interactions. The multilinear model follows this tradition, 
and is built as a sum of effects involving one locus (marginal 
effects), two loci (pairwise interaction effects), three loci, etc. 
For instance, second-order epistasis is the difference between the 
double mutant and twice the single mutant effect (Figure 6). 
In contrast, in the power model, there are as many interaction 
effects as there are mutations, which leads to very complex epista- 
sis. For most realistic values of (5 (0 < fi < 2), the second- and 
third-order interactions have opposite effects — in other words, 
if combining two mutations has antagonistic effects, combining 
three of them will have synergistic effects (the triple mutant is 
closer to additivity than predicted by the sum of second-order 
interactions). Moreover, the magnitude of high-order epistatic 
effects can represent a substantial fraction of lower-order effects 
(Figure 6), suggesting that combined mutant phenotypes are 
heavily impacted by the emergent properties of specific combi- 
nations of allelic substitutions, and thus difficult to predict from 
experimental results. 

This issue is avoided with the quadratic model, which is lim- 
ited to interactions between pairs of loci. However, this quadratic 
model implies that mutational effects can switch signs depending 
on the genetic background (sign epistasis). This property, which 
is sometimes perceived as undesirable when considering epistasis 



for fitness ( Wilke and Adami, 2001), could explain the persistence 
of alternative models. Another side effect of most unidimensional 
models of epistasis for fitness is that mutations are assumed to be 
of constant size. Relaxing this assumption significantly alters the 
evolutionary properties of the system (Butcher, 1995; Otto and 
Feldman, 1997), casting doubts on the operational meaning of fi 
(or fi') parameters. 

4.2. FULL-GENOME EPISTASIS 

For most of the 20th century, the concept of genotype-to- 
phenotype map was mostly virtual, and mainly used for the- 
oretical purposes. The possibility to access complete individual 
genomes for a reasonable price has not really been anticipated 
by quantitative geneticists, and we are now in the uncomfort- 
able situation of not being able to properly translate the massive 
amount of data collected experimentally into ground-breaking 
theoretical insights. Indeed, it is widely acknowledged that the 
revolutionary improvement in the quality and quantity of geno- 
typic information has not generated a proportional improvement 
in our ability to describe the genetic architecture of quantitative 
traits from genome-wide association studies. This "missing her- 
itability" problem might be partly due to our inability to detect 
properly epistatic interactions (Maher, 2008; Zuk et al, 2012; 
Hemani et al, 2013). 

Identifying interacting pairs of loci from a genotype- 
phenotype dataset schematically follows two strategies: (i) com- 
bine epistatic and marginal effects while mapping loci, with 
the hope to increase the genetic signal (Carlborg and Haley, 
2004), or (ii) first map loci based on their marginal effects, and 
estimate epistasis a posteriori between pairs of significant loci. 
Although theoretically elegant, the first strategy generally col- 
lapses with high-quality sequencing data because there are so 
many pairwise combinations to be tested that statistical noise 
overcomes the genetic signal by orders of magnitude. So far, 
the second strategy is thus unavoidable for estimating epistasis 
from high-throughput sequencing data. On the one hand, some 
epistatic loci will not be detected (in particular, those involved 
in sign epistasis, which may have no marginal effect). On the 
other hand, we know from Equation (2) that the impact of 
loci on the composite epistatic coefficient is weighted by their 
(marginal) genetic variance, meaning that the loci with no addi- 
tive effects will not affect directional epistasis. Consequently, esti- 
mating epistatic noise in general remains a complex task, and may 
require further statistical development. When it comes to direc- 
tional epistasis, focusing on major loci is much less problematic 
and ensures a proper estimation of this biologically meaningful 
parameter. 

4.3. CONSISTENCY ACROSS ESTIMATES 

This paper illustrates the estimation of epistasis directionality by 
several methods, using independent data describing the same bio- 
logical system. The various estimates are reported in Table 2. The 
units and the meaning of the epistatic coefficients differ according 
to the method. In order to facilitate the comparison, an epistatic 
factor /ioo is provided. This factor corresponds to the coefficient 
by which genetic effects change when body weight increases by 
(arbitrarily) 100 g. 
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FIGURE 6 | Illustration of high-order epistatic effects in the power 
model (here with negative epistasis, an 1 ' with a = 0.1 and ft = 0.8). 

The second-order epistatic effect is negative (the power model is always 
below the additive prediction), but the third-order effect is positive (the 
power model is always above the quadratic model). The sign of the 
interactions thus alternates when fi < 2, and their relative size does not 
decrease rapidly. As a result, the effect of combining several mutants 
cannot be properly inferred from simpler combinations — for instance, the 
prediction for four mutants is not much better for the second-order epistatic 
model than for the additive model, and can even be worse with more 
substitutions. 
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Table 2 | Summary of the directional epistasis estimates from 
different sources of data and different methods. 



Source of data 



Method 



Estimate 



'100 



Selection response Quadratic 

approximation 



Line cross 
QTL 
QTL 
QTL 



(Equation 3) 
Line cross analysis 
(Equation 4) 
Multilinear 
regression 
Power model 
(Equation 7) 
Quadratic model 



e = 6.6 x 10- 5 g _1 

e = 1.9 x 10" 3 g" 1 
e = 5.7 x ICHg- 1 
P = 2.18 
/3' = 8.3g 



1.007 

1.19 
6.7 
6.6 
2.0 



Estimates can be compared with the fjoo factor. 

Directional epistasis estimates are consistently positive, and 
in most cases statistically significant. This provides strong con- 
firmation that the genetic architecture of the weight differences 
between the high and low chicken lines is characterized by positive 
epistasis. However, the epistatic coefficients vary by several orders 
of magnitude in the different experiments; two categories of esti- 
mates can be defined: epistasis is strong when measured from 
the genotype data (increasing the phenotype by 100 g multiplies 
the allelic effects by 2 to almost 7), but weaker when measured 
from phenotype data (increasing the phenotype by 100 g increases 
allelic effects by 0.7 to 19%). 

These measures are not necessarily contradictory, because 
epistasis can be restricted to a specific subset of the genetic 
architecture. As the epistatic coefficient measures the "average" 
curvature of the genotype-phenotype map, it is strongly affected 
by the nature of the data (and more specifically, the span of the 
data in terms of number of loci and phenotype range), as it seems 
to be the case for the chicken bodyweight (Figure 7). The extreme 
epistatic factors measured from the QTL data can be attributed to 
several factors. The four large-effect QTLs are not a random sam- 
ple of loci, their effect is statistically inflated by detection bias (the 
Beavis effect: Beavis, 1994; Xu, 2003), and their strong epistatic 
interactions remain atypical (Carlborg et al., 2006). Their inter- 
action pattern involves sign epistasis (Le Rouzic et al., 2007), so 
that additive effects vanish in some genetic backgrounds: increas- 
ing a small effect by a large factor does not necessarily mean that 
the absolute interaction effect is huge. In any case, even if posi- 
tive epistasis is very strong for the 4 major loci, these QTLs only 
explain 7% of the total phenotypic variance, and the F2 popu- 
lation covers only 50% of the phenotype range of the parental 
lines. If directional epistasis is not a property of the whole genetic 
architecture, but merely reflects specific interactions between a 
few loci, data involving more loci and more genetic backgrounds 
would be expected to reveal less directional epistasis, which seems 
to be the case here with a striking regularity among the three 
independent data sources (Figure 7). 

5. CONCLUDING REMARKS 

Unidimensional directional epistasis measures how the properties 
of genetic architectures change with the phenotype. It has often 
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FIGURE 7 | Negative relationship between the span of the phenotypes 
in the data set and the directional epistasis coefficient. 



been confused with scaling. Scale transformation is a common 
operation in biology, often motivated by the need to make the 
data suitable for a particular statistical analysis (e.g., enforcing 
normality). Changing the scale of the phenotype measurement 
impacts on directional epistasis (Pavlicev et al., 2010), and it 
is possible to find an arbitrary scale transformation on which 
directional epistasis becomes negligible (or even is canceled out) 
in a data set. Applying such ad hoc mathematical operations to 
phenotypes prior to analysis could hardly be considered good 
practice. First, it has been repeatedly pointed out to biologists 
that, according to measurement theory, scales do actually have 
a meaning, and are thus not interchangeable (Wagner et al., 
1998; Houle et al., 2011). One of the best examples is fitness, 
which is essentially multiplicative (Wagner, 2010). Epistasis on fit- 
ness thus has to be measured as the deviation from log-linearity, 
which justifies models of directional epistasis presented above. 
Obviously, directional epistasis following the power model can- 
cels out on a log scale, but such a double log transformation 
would be meaningless, and should not be seriously considered. 
A second reason why scale change does not solve the problem 
of directional epistasis is that one should not necessarily expect 
consistent directionality. As exemplified by the chicken example, 
and illustrated in Figure 2, directionality is a local measure of 
the interlocus curvature of the genotype-phenotype map. It is 
thus likely that directionality could itself evolve as the phenotype 
changes (in the presence of third-order epistasis and higher- 
order interactions, directionality could even change when the 
phenotype remains constant). Therefore, comparing the proper- 
ties of genetic architectures across populations or species requires 
measuring directional epistasis on a common scale. 

Recent conceptual and theoretical advances have convincingly 
demonstrated that what matters in epistasis is not its direct con- 
tribution to genetic variation (interaction variance), but rather 
its propensity to (indirectly) influence the evolution of additive 



Frontiers in Genetics | Evolutionary and Population Genetics 



July 2014 I Volume 5 | Article 198 | 10 



Le Rouzic 



Estimating directional epistasis 



genetic variance. This propensity can be estimated by looking 
for specific patterns among epistatic interactions. The direction- 
ality of epistasis may be the most obvious, but other patterns 
are also emerging as candidate contributors to the evolvabil- 
ity of genetic architectures, such as the monotonicity of the 
genotype-phenotype relationship (closely linked to sign epistasis) 
(Gjuvsland et al., 2011, 2013), and the robustness or canalization 
of genetic architectures (Hermisson and Wagner, 2004; Draghi 
et al, 2010; Fraser and Schadt, 2010; Le Rouzic et al, 2013). 

In quantitative genetics and breeding, correctly describing 
epistasis can improve the prediction of selection responses. 
In evolutionary genetics, epistasis determines the structure of 
genetic diversity and variability. At the phylogenetic scale, direc- 
tional epistasis could contribute to biased anagenesis patterns and 
affect evolutionary trajectories. Most molecular mechanisms do 
not simply add up, and the genotype-phenotype relationship has 
to be curved to some extent. Is the observed curvature (quanti- 
fied with one or several of the methods described here) consistent 
with predictions from system-biology models? To what extent is 
it constrained by the physical properties of the phenotypic trait? 
Does it vary depending on the trait, on the species? Does it evolve 
rapidly? The importance of determining directional epistasis for 
a wide diversity of traits in many organisms has probably been 
underestimated in the past, but now appears to be a key toward 
obtaining a better understanding of the general properties of 
genetic architectures. 
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APPENDIX I: MULTILINEAR EPISTASIS ON A CONTINUOUS 
GENOTYPE-PHENOTYPE MAP 
TWO LOCI 

The multilinear model of Hansen and Wagner (2001b) is defined 
based on a reference genotype, and proposes a change-of- 
reference operation to recompute the genetic effects in a different 
genotype, assuming a multilinear genotype-phenotype map. In 
an arbitrary genotype-phenotype relationship, the multilinear 
model can be considered to be a local approximation of the mul- 
tilocus curvature, and epistatic coefficients can be calculated from 
Taylor polynomial coefficients. 

Let g(yi,y 2 ) be a continuous and differentiable (at least 
twice) two-dimensional Genotype-Phenotype function associat- 
ing a phenotype value P to any genotype combination {yi,y 2 ) 
at two loci. The gradient vector at a particular genotype Y = 
(Ti, I^) is D (Dj = dg{yi,y 2 )/dyj\r 1 ,r 2 )> ana the Hessian matrix 
is D 2 (D 2 . = d 2 g{y\, y 2 )/dyidyj\ri,r 2 )- The second-order Tailor 
series around this genotype T is: 

P(yi,y2)^g(ri,r 2 ) + D l (yi-r 1 ) + D 2 (y 2 -r 2 ) 



+ 1 -D 2 ll (y l -r 1 ) 2 + 1 -D 2 2 2 (y 2 -r 2 ) 2 
+ D 2 2 (y l -F l )(y 2 -F 2 ). 



(Al) 



Rescaling as y[ = D\(yi — Fi) and y' 2 = D 2 (y 2 — F 2 ) and 
neglecting the quadratic terms leads to a multilinear approxima- 
tion taking the genotype F as a reference point: 



D 2 

P(/i ,y' 2 )^g(F l ,F 2 )+y[+y' 2 + y\y' 2 — ^ 



(A2) 



where it appears clearly that the directionality coefficient of 
Hansen and Wagner (2001b) is Sij = D 2 j/D{Dj. The quadratic 

terms jD 2 x )/ 2 and \D 2 2 2 7 2 2 disappear from the equation as a 
consequence of the multilinear approximation. 

SEVERAL LOCI 

The previous approximation can be extended to several loci in a 
straightforward way: 



d 2 g 
dy,dyj 



I / U g I c> I 



(A3) 



Developing the third-order Taylor series and neglecting all 
quadratic terms, the third-order epistatic coefficients can be writ- 
ten as follows: 



Sijk 



0 8 I i?g 
dy,dyjdy k ^ T ' dy, 



I ^8 \ ^8 \ 

lr dy } lr Vk lT " 



(A4) 



The multilinear approximation can thus be easily extended to 
any number of loci and any order of epistasis, with the n th order 
epistasis coefficient being the « th mixed partial derivative of the 
genotype-phenotype function scaled by the product of the first- 
order derivatives of this function for all loci involved in the 
interaction. 



APPENDIX II: EFFECT OF DIRECTIONAL EPISTASIS ON 
ARTIFICIAL SELECTION RESPONSE 

The impact of directional epistasis on the response to direc- 
tional selection is rather complex to predict precisely for arbi- 
trary time periods (Carter et al, 2005). Nevertheless, useful 
approximations can still be derived by making realistic assump- 
tions about the properties of genetic architectures. For instance, 
Le Rouzic et al. (20 1 1 ) proposed a model that can be simplified as: 



Mt+l = Mt + V At Pt 



V At + 2p t sV 2 t 



(A5a) 
(A5b) 



Equation (A5a) is the traditional breeder's equation, formu- 
lated as in Lande and Arnold (1983), where Va is the addi- 
tive genetic variance, and fi the selection gradient, i.e., the 
slope of the regression between phenotype and relative fitness. 
Equation (A5b) approximates the impact of directional epistasis 
on additive variance, summarized by the directionality coeffi- 
cient e. 

This model requires 3 parameters: /iq, the initial phenotype, 
the initial additive variance V Ao , and the epistatic parameter e. 
Fitting the model by maximizing its likelihood for phenotype 
times series including means and variances provide convincing 
estimates of epistasis, especially when the data include bidirec- 
tional artificial selection (Le Rouzic et al, 201 1). 

Unfortunately, variance time series are not always available 
from historical data, because either they were measured but not 
reported in the corresponding publications, or simply because 
they were not computed, as only the mean phenotype was the cen- 
ter of interest. Moreover, fitting such a complex multidimensional 
non-linear model can be tricky, and requires significant computer 
programming input (and possibly having to solve numerical con- 
vergence issues). Proposing simpler formulas could therefore be 
helpful, as they may allow any biologist with basic statistical 
knowledge to report the strength of directional epistasis based on 
average phenotype data. 

The following calculation is based on several approximations, 
the main ones being that selection is expected to be constant 
(p t = fi), and that linkage disequilibrium can be ignored. If direc- 
tional epistasis is the only phenomenon affecting the selection 
response, the additive genetic variance is expected to change as in 
Equation (A5b). Approximating the discrete process by a contin- 
uous function leads to the ordinary differential equation = 
2PeV\, which can be solved as: 



Va, 



Va, 



1 - 2f3V Ao st 



(A6) 



Assuming that directional epistasis is not very strong 
(e/jVAo 1)> the expected phenotype at time t results 
from the product between the (supposedly constant) selection 
gradient fi and the cumulative change in V A , which can be 
calculated as: 



Mr = Mo + P 



fv Ar 

Jo 



dr 



Mo 



log(l -2f3V AQ et) 
2e 



(A7) 
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